ABSTRACT

Results of a Hong Kong survey, described in an
earlier report, are summarized here. The study investigated the
English vocabulary size of native-speaking adults (n=78), Chinese
non-native-speaking adults (n=201), and non-Chinese non-native-speaking
adults (n=9). The vocabulary used reflected
British rather than American English usage. Data were gathered on
respondent age and educational and language backgrounds. Each
respondent also checked off the words he knew from among 150
vocabulary items. The instruments used are appended. Results indicate
that the most important factor in receptive vocabulary is being a
native speaker. Age and exposure to English were also found to be
important, but the advantages of age were offset by extent of formal
education. Gender appeared to play no role. (An appendix contains the
questionnaire, 7 tables, and 10 references.) (MSE)
INVESTIGATING LEXIS BEYOND THE MOST FREQUENT WORDS
PART 2



Norman Bird 



1. Introduction 

In the introduction to the AILA review on Vocabulary Acquisition Carter (1989:5) 
noted with the support of a sizeable bibliography that: 

The past decade has seen a considerable expansion of interest in vocabulary 
studies ... It can now be claimed that vocabulary is no longer a victim of 
discrimination by researchers who for a considerable period of time deemed 
syntax to be the sole core of processes of language development. 

This interest in vocabulary acquisition has continued into the nineties and the one 
year since the 8th ILE Conference in 1992 has seen some notable advances. For 
example, in Hong Kong alone I.S.P. Nation gave a public lecture in City 
Polytechnic in November 1992 and his book Teaching and Learning Vocabulary 
became generally more available in the following year; 1993 also saw the 
publication of Pemberton, R. and Tsang, E.S.C. (eds.) Studies in Lexis, the working 
papers from the Second Seminar in Lexis held in the Language Centre, The Hong 
Kong University of Science and Technology (HKUST), and in June the third 
seminar on the same subject became a joint seminar on corpus linguistics and 
lexicology held in HKUST and the Guangzhou Institute of Foreign Languages. 

After many years of comparative neglect, it seems that at last vocabulary has 
come to be recognized as one of the primary linguistic resources whereby meaning 
is encoded for the purpose of communication. In view of these recent 
developments, therefore, it is reasonable that reliable test instruments should be 
developed to measure the size of language learners' vocabularies as a step towards 
developing more efficient ways and means of increasing word power. It was with
this in mind that the survey described in the first part of this paper was carried out, 
and it is hoped that the preliminary and necessarily limited findings described in 
this part of the paper will contribute to producing a still more efficient measuring 
instrument in the future. 

The rationale, references, research methods and resultant problems for this 
research are described in full in Part One (Bird 1993). In brief, Part One describes
an attempt to replicate the research described in Goulden, Nation and Read (1990) 
How Large Can a Receptive Vocabulary Be?, and produce a series of 50-word 
vocabulary tests of similar design in which each column of 10 words measures 
incrementally the mastery of 5,000 words in terms of frequency. The purpose of 
this research is: 

1. to replicate and check the procedures previously used;

2. to use The Oxford English Dictionary, 2nd Edition (1989) (OED2), as the
resource corpus instead of Webster's Third New International Dictionary
(1961) and 9,000 Words (1983), since OED2 is both more up-to-date and
reflects British rather than American English;

3. to produce materials that can be used in further research. 
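
The test design summarized above implies a simple scoring arithmetic, sketched here in Python. The sketch assumes that each of the 50 items stands for 500 words (5,000 words per column divided by 10 items); this weight follows from the column design but is not stated as a formula in this paper, and the names used are illustrative only.

```python
WORDS_PER_COLUMN = 5_000   # each column samples one 5,000-word frequency band
ITEMS_PER_COLUMN = 10      # ten test words per column
WORDS_PER_ITEM = WORDS_PER_COLUMN // ITEMS_PER_COLUMN  # assumed weight: 500


def estimate_vocabulary(column_scores):
    """Estimate receptive vocabulary size from per-column ticks (0-10 each)."""
    for ticks in column_scores:
        if not 0 <= ticks <= ITEMS_PER_COLUMN:
            raise ValueError("each column score must lie between 0 and 10")
    return sum(column_scores) * WORDS_PER_ITEM


# Hypothetical respondent: ticks per column, most frequent band first.
example_scores = [10, 8, 4, 2, 1]
print(estimate_vocabulary(example_scores))  # 25 ticks * 500 = 12500
```

On this reading a perfect score of 50 corresponds to the full 25,000 words sampled by one test.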



As a part of this research a questionnaire was produced consisting of the 
following: 

1. a personal profile of the respondents, both native speakers (NS) and non- 
native speakers (NNS) regarding their mother tongue, age, formal 
qualifications and sex; for (NNS) of English an extra question concerning 
the number of years of exposure to English including time spent at school 
learning the subject is also asked; 

2. one test from Goulden, Nation and Read (1990); 

3. two tests attempting to replicate the Goulden, Nation and Read test, but 
based on OED2 (1989) and not Webster (1961 and 1983). 



2. Responses to the questionnaire 

Since the eighth ILE conference a total of 288 questionnaires were completed, 
returned and analyzed. The results in terms of the personal profile are presented in 
Table 1. 



Table 1

Analysis of Questionnaire According to Personal Profile

a. English Native Speakers (NS) (total 78)

Quals.     Yrs. 20+   Yrs. 30+   Yrs. 40+   Yrs. 50+   Total
Ph.D.          0          2          2          3         7
M.A.           0          9         19         11        39
B.A.           5          9          7          1        22
Others         2          4          0          4        10

b. Non-English Native Speakers NNS (Chinese) (total 201)

Quals.     Yrs. 20+   Yrs. 30+   Yrs. 40+   Yrs. 50+   Total
Ph.D.          0          0          0          0         0
M.A.           3          9          3          1        16
B.A.           9         43         22          1        75
Others        41         38         28          3       110

c. Non-English Native Speakers NNS (Non-Chinese) (total 9)

Quals.     Yrs. 20+   Yrs. 30+   Yrs. 40+   Yrs. 50+   Total
Ph.D.          0          0          1          1         2
M.A.           1          1          1          0         3
B.A.           0          0          1          0         1
Others         1          0          1          1         3




3. General Analysis of the Questionnaires 

For the purposes of analysis, the rules of procedure described below were laid 
down after the first gross analysis of the returns. 

3.1. Any particular category should contain at least 10 returns, otherwise the 
results should be ignored except as a description of general trends. As a 
consequence, the responses from NNS (non-Chinese) (Table 1c) are not included
in the analysis for this research paper.
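
As a cross-check on Table 1 and rule 3.1, the following Python sketch totals the returns for each group and applies the 10-return minimum at group level. Applying the threshold to whole groups rather than to individual cells is an assumption made here for illustration; the data structure and function names are likewise hypothetical.

```python
MIN_RETURNS = 10  # rule 3.1: fewer returns than this are not analysed

# Table 1 counts per qualification, in age-band order 20+, 30+, 40+, 50+.
TABLE_1 = {
    "NS": {
        "Ph.D.": [0, 2, 2, 3], "M.A.": [0, 9, 19, 11],
        "B.A.": [5, 9, 7, 1], "Others": [2, 4, 0, 4],
    },
    "NNS (Chinese)": {
        "Ph.D.": [0, 0, 0, 0], "M.A.": [3, 9, 3, 1],
        "B.A.": [9, 43, 22, 1], "Others": [41, 38, 28, 3],
    },
    "NNS (non-Chinese)": {
        "Ph.D.": [0, 0, 1, 1], "M.A.": [1, 1, 1, 0],
        "B.A.": [0, 0, 1, 0], "Others": [1, 0, 1, 1],
    },
}


def group_total(group):
    """Total number of returns for one respondent group."""
    return sum(sum(row) for row in TABLE_1[group].values())


def included_groups():
    """Groups that meet the minimum number of returns."""
    return [g for g in TABLE_1 if group_total(g) >= MIN_RETURNS]


for name in TABLE_1:
    print(name, group_total(name))
print("all returns:", sum(group_total(g) for g in TABLE_1))
print("analysed:", included_groups())
```

The group totals come to 78, 201 and 9, which together account for the 288 questionnaires returned.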

3.2. In order to limit the effect of variations in scores between tests, the two
following procedures were followed in the case of the analysis:

1. The two top scores out of 50 were added to give the general score out of 100
for the general analysis, and the third score was ignored.

2. As 87% of the NNS found the first 30 items of Test 1 to be more difficult 
than those in Tests 2 and 3, test results of the remaining 13% were ignored 
for the purposes of the detailed analysis. 
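
The first procedure under 3.2 can be sketched in Python as follows. The function name and the sample respondent are hypothetical; only the rule itself (add the two highest of the three scores out of 50, ignore the third) comes from the text.

```python
MAX_PER_TEST = 50


def general_score(test_scores):
    """Add the two highest of three test scores (each out of 50).

    The result is the general score out of 100; the lowest score is ignored.
    """
    if len(test_scores) != 3:
        raise ValueError("expected one score for each of the three tests")
    if any(not 0 <= s <= MAX_PER_TEST for s in test_scores):
        raise ValueError("each test score must lie between 0 and 50")
    best_two = sorted(test_scores, reverse=True)[:2]
    return sum(best_two)


# Hypothetical respondent with scores on Tests 1, 2 and 3.
print(general_score([18, 27, 25]))  # 27 + 25 = 52
```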

3.3. As two discriminating factors in the personal profile (sex (cf. 4.4) and holding
a Ph.D. vs. M.A.) were found to have no apparent effect on the test results, these
two factors are ignored for the purpose of this small-scale research.



4. Detailed Analysis of the Questionnaires 

The personal profiles were sorted and analyzed with respect to the following 
criteria: mother tongue, age, qualifications, sex and exposure to English (non-native 
speakers only). 

4.1. Mother tongue was found to be the most crucial factor in ranking the results.
The dividing line between NS and NNS results was 70%. One NNS scored above
this figure - namely, a 40+ year-old B.A. holder who had lived for over 20 years
in the English-speaking world. Two NS scored below 70%: one obtained a score
of 69%, marginally below the 70% value; the other, with 61%, was a 30+ year-old
Eurasian, neither of whose parents was a NS, who was brought up entirely in Hong
Kong and who had no formal qualifications beyond school certificate.

4.2. Age was found to be an important factor in the results both in the case of NS 
and NNS. Receptive vocabulary size remains stable or increases throughout that 
part of life measured in this research (cf. Table 2). The 2% variation between the 
scores of NS aged 30+ vs. 40+ years is assumed to be statistically insignificant. 



Table 2

Comparative Analysis of Selected Tests
With the Single Variable of Age (and hence Experience)

a. Constant features: Chinese - B.A. - Female.

b. Constant features: Native-speakers of English - M.A.

Type    Age/Exp.     Age/Exp.    Age/Exp.    Age/Exp.
        20+ (20)*    30+ (25)    40+ (25)    50+ (25)
           %            %           %           %
a.         40           44          50          -
b.         -            82          80          -


*NB The figures in brackets (e.g. (20)) indicate the number of years of 
exposure to English. 

4.3. Formal qualifications were also found to be an important factor in the scores
of both NS and NNS, as can be seen from Tables 3 and 4.

Table 3

Comparative Analysis of Selected Tests
With the Single Variable of Formal Qualifications
(Chinese Speakers of English)

Constant features: Chinese - Age (Experience) - Female.

Type    Age/Exp.     Age/Exp.    Age/Exp.    Age/Exp.
        20+ (20)     30+ (25)    40+ (25)    50+ (25)
           %            %           %           %
T.C.*      32           36          40          -
B.A.       40           44          50          -

*T.C. = Teacher's Certificate

In the case of NNS the most interesting feature to emerge from Table 3 is that 
the scores of 40+ year-old T.C. holders and 20+ year-old B.A. holders are identical. 
This suggests that holding the higher qualification of B.A. is equivalent to approximately
20 years of experience, derived no doubt from the different quality of exposure to
English that degree holders enjoy.
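
The reasoning behind the "20 years" figure can be made explicit with a small Python sketch that looks up which T.C. age band reaches the score of a given B.A. age band. The scores are those of Table 3; the dictionary layout and function name are illustrative assumptions.

```python
# Table 3 scores (%), keyed by the lower bound of each age band.
TC_SCORES = {20: 32, 30: 36, 40: 40}
BA_SCORES = {20: 40, 30: 44, 40: 50}


def tc_age_matching(ba_age):
    """Return the youngest T.C. age band whose score equals the B.A. score.

    Returns None when no T.C. band in the table reaches that score.
    """
    target = BA_SCORES[ba_age]
    for age in sorted(TC_SCORES):
        if TC_SCORES[age] == target:
            return age
    return None


match = tc_age_matching(20)
print(match, match - 20)  # T.C. band 40+, i.e. a 20-year gap
```

Only the 20+ B.A. band has an exact T.C. counterpart in the table, so the equivalence rests on a single pair of scores.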

Table 4

Comparative Analysis of Selected Tests
With the Single Variable of Formal Qualifications
(Native Speakers of English)

Constant features: Native-speaking English - Age.

Type    Age/Exp.     Age/Exp.    Age/Exp.    Age/Exp.
        20+ (20)     30+ (25)    40+ (25)    50+
           %            %           %           %
B.A.       11           78          82          -
M.A.       -            82          80          -

Within the limits of this admittedly small sample composed solely of graduates,
formal qualifications appear less important to native speakers than non-native 
speakers. In the case of NS holding a Ph.D. it was found that their scores did not 
differ from those holding an M.A., and consequently for the purposes of this 
research the factor of Ph.D. vs. M.A. was ignored. 

4.4. Differences in sex were not found to affect scores. A small-scale comparison 
of the tests of two groups of 10 Chinese B.A. holders (aged 30-39) in which the 
only distinguishing feature was sex, showed a difference of only 0.2%; as this 
figure is of no statistical significance, the sex factor was ignored in attempting to 
form homogeneous groups for comparative purposes in this small-scale research. 



5. Analysis of Tests 

As mentioned in 3.2.2 the first general analysis of the returns revealed that 87% of 
the NNS found the first 30 items of Test 1 more difficult than those in Tests 2 and 
3. 

5.1. A detailed analysis of the 3 tests (10 Chinese respondents) is found in
Table 5.



Table 5

Comparative Analysis of the results for Tests 1-3
From a Single Homogeneous Group of Chinese Respondents

Constant features: Chinese - Age: 40+ years - Exposure to English: 25+ years -
Qualifications: B.A. - Profession: Teachers (ELT) - Sex: Both
Males and Females.

Test                  Q 1-10   Q 11-20   Q 21-30   Q 31-40   Q 41-50
1                       95       52        18        16         9
2                      100       82        48        21        13
3                       98       79        41        13         7
Av. % Tests 2 & 3       99       80.5      44.5      18        10

Table 5 shows considerable discrepancies between the scores in Test 1 (Goulden
et al.) and Tests 2 and 3 (Bird) especially in the first three columns (amounting to 
30 items measuring the mastery of the 15,000 most frequent words). 
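
The size of these discrepancies can be read off by recomputing the Tests 2 and 3 averages from Table 5 and subtracting the Test 1 score column by column, as in the Python sketch below. The variable and function names are illustrative. Recomputed this way, the column 4 mean comes out at 17 rather than the 18 printed in the table, which may reflect rounding or a transcription slip in one cell.

```python
# Table 5 scores (%) per column of ten items, most frequent band first.
TABLE_5 = {
    1: [95, 52, 18, 16, 9],
    2: [100, 82, 48, 21, 13],
    3: [98, 79, 41, 13, 7],
}


def mean_of_tests_2_and_3(table):
    """Column-by-column mean of the Test 2 and Test 3 scores."""
    return [(a + b) / 2 for a, b in zip(table[2], table[3])]


def gap_to_test_1(table):
    """How far Test 1 falls below the Tests 2 and 3 mean, per column."""
    means = mean_of_tests_2_and_3(table)
    return [m - t1 for m, t1 in zip(means, table[1])]


print(mean_of_tests_2_and_3(TABLE_5))  # [99.0, 80.5, 44.5, 17.0, 10.0]
print(gap_to_test_1(TABLE_5))          # [4.0, 28.5, 26.5, 1.0, 1.0]
```

The gap is concentrated in columns 2 and 3 (28.5 and 26.5 percentage points), while columns 4 and 5 differ by about one point.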

Further investigation reveals that according to Thorndike and Lorge (1944) certain 
test words appear in inappropriate columns, as follows: 

1. Column 1 (1-5,000 most frequent words): homage (6,000), colleague (7,000); 

2. Column 2 (5,001-10,000 most frequent words): atrophy, broach, con, halloo,
marquise, stationery, woodsman (beyond the 10,000 most frequent words). 
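
The column check described above amounts to mapping a frequency rank onto its 5,000-word band. A minimal Python sketch follows; it treats the Thorndike and Lorge figures quoted for homage and colleague as approximate ranks, which is an assumption, and the function name is hypothetical.

```python
BAND_SIZE = 5_000  # each test column covers one band of 5,000 words


def column_for_rank(rank):
    """Return the 1-based test column whose band contains this rank."""
    if rank < 1:
        raise ValueError("frequency rank must be a positive integer")
    return (rank - 1) // BAND_SIZE + 1


# Words printed in Column 1 of Test 1, with the ranks quoted in the text.
column_1_words = {"homage": 6_000, "colleague": 7_000}

for word, rank in column_1_words.items():
    print(word, "printed in column 1, belongs in column", column_for_rank(rank))
```

Both words fall in the second band, so on this evidence they sit one column too early in Test 1.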



5.2. A detailed analysis of the 3 tests (10 NS respondents) is found in Table 6. 

Table 6 also shows considerable discrepancies between the scores in Test 1 
(Goulden et al.) and Tests 2 and 3 (Bird). In the case of NS these discrepancies are
not visible in the first two columns, as the test-words are known by virtually all the 
respondents. The scores for Test 1 in columns 3 and 4 are contrary to common 
sense, and suggest that this particular group knows the frequency group of words 
(15,000+) better than (10,000+). 
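
Since each successive column samples rarer words, scores would be expected to fall, or at least not rise, from one column to the next. The Python sketch below flags any column in the Table 6 figures that scores higher than the column before it; the function name is an illustrative choice.

```python
# Table 6 scores (%) per column of ten items, most frequent band first.
TABLE_6 = {
    1: [100, 97, 69, 85, 53],
    2: [100, 100, 99, 77, 43],
    3: [100, 100, 96, 63, 44],
}


def rising_columns(scores):
    """Return 1-based column numbers that score above the preceding column."""
    return [i + 1 for i in range(1, len(scores)) if scores[i] > scores[i - 1]]


for test, scores in TABLE_6.items():
    print("Test", test, "rising columns:", rising_columns(scores))
```

Only Test 1 is flagged, at column 4 (85% against 69% in column 3); Tests 2 and 3 decline steadily.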



Table 6

Comparative Analysis of the results for Tests 1-3
From a Single Homogeneous Group
of Native English-speaking (NS) Respondents

Constant features: NS - Age: 40+ years* - Qualifications: B.A. - Sex: M & F.

*3 respondents are 30+ but obtained the same median score as the 7
respondents in the 40+ group.

Test                  Q 1-10   Q 11-20   Q 21-30   Q 31-40   Q 41-50
1                      100       97        69        85        53
2                      100      100        99        77        43
3                      100      100        96        63        44
Av. % Tests 2 & 3      100      100        97.5      70        43.5

As Thorndike and Lorge (1944) is rather dated and based largely on school
materials, the entries in Test 1 were also checked against the frequencies given in 
Hofland and Johansson (1982), the LOB list. It must be remembered, however, that 
LOB is based on a corpus of only one million running words of British English; 
furthermore the list has not yet been lemmatized, and thus measures the frequency 
of graphemes and not lexemes. LOB must, therefore, be used with extreme caution 
as an aid to obtaining an indicator of the order and trends in the frequency list of 
English words, but not as a statement of absolutes. A comparative analysis of the 
words in Test 1 and 2 (columns 2 and 3, i.e. lexeme frequency 5,001-15,000) is 
given in Table 7.

As mentioned in 3.2.2, in order to avoid problems arising out of variations in the 
difficulty of the three papers, and as 87% of the NNS found the first 30 items
of Test 1 to be more difficult than those in Tests 2 and 3, test results of the 
remaining 13% were ignored for the purposes of the detailed analysis. 

This variation between tests affected NS less than NNS, i.e. only 68% found 
Tests 2 and 3 easier than Test 1, due in part to the fact that column 4 was easier 
than column 3. As a result of this discrepancy, it was decided to discontinue the use 
of Test 1 in future surveys, and for this research to use it with care, and only when 
necessary, if the size of an individual sample population to be studied was less than 
10. 



Table 7

Comparative Analysis of the Frequency According to LOB
of the Words in Tests 1 and 2 (Columns 2 and 3)

Test and
Column     Word and Frequency per Million

1 - 2      shrew, atrophy, con, halloo, marquise, woodsman (0),
           avalanche, firmament, broach (1), stationery (3).

2 - 2      swine, chink, ooze, filth (1), tributary (2), surf (4),
           idol (6), terminal (13), stationary (15), potential (42).

1 - 3      bastinado, countermarch, furbish, meerschaum,
           patroon, curricle, weta, bioenvironmental (0), regatta
           (4), asphyxiate (7).

2 - 3      misdemeanour, libertine, complicity, plethora (0),
           tentacle, dynamo, masticate/mastication, argent (1),
           whiff (2), disruption (3).
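
To compare the four word sets in Table 7 at a glance, the Python sketch below computes, for each test and column, the mean LOB frequency per million and the number of words that do not occur in LOB at all. The choice of these two summary figures and the names used are assumptions made for illustration, and the caution about LOB given above applies equally to them.

```python
# Table 7: LOB frequency per million for each word, keyed by (test, column).
TABLE_7 = {
    ("Test 1", 2): {
        "shrew": 0, "atrophy": 0, "con": 0, "halloo": 0, "marquise": 0,
        "woodsman": 0, "avalanche": 1, "firmament": 1, "broach": 1,
        "stationery": 3,
    },
    ("Test 2", 2): {
        "swine": 1, "chink": 1, "ooze": 1, "filth": 1, "tributary": 2,
        "surf": 4, "idol": 6, "terminal": 13, "stationary": 15,
        "potential": 42,
    },
    ("Test 1", 3): {
        "bastinado": 0, "countermarch": 0, "furbish": 0, "meerschaum": 0,
        "patroon": 0, "curricle": 0, "weta": 0, "bioenvironmental": 0,
        "regatta": 4, "asphyxiate": 7,
    },
    ("Test 2", 3): {
        "misdemeanour": 0, "libertine": 0, "complicity": 0, "plethora": 0,
        "tentacle": 1, "dynamo": 1, "masticate/mastication": 1, "argent": 1,
        "whiff": 2, "disruption": 3,
    },
}


def summarize(frequencies):
    """Mean frequency and count of words absent from LOB for one word set."""
    values = list(frequencies.values())
    return {
        "mean": sum(values) / len(values),
        "absent": sum(1 for v in values if v == 0),
    }


for (test, column), words in TABLE_7.items():
    print(test, "column", column, summarize(words))
```

In column 2 the contrast is marked: six of the ten Test 1 words are absent from LOB and the mean is 0.6 per million, against no absent words and a mean of 8.6 for Test 2. In column 3 the means are close (1.1 and 0.9), although Test 1 has eight absent words to Test 2's four.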




6. Conclusions 



The 3 tests confirm what many may have long suspected but have not necessarily 
seen formally measured, namely that the most important factor in receptive 
vocabulary is being a NS. Age and hence exposure to English are also clearly 
important, but the 'advantages of age' are offset by the possession of formal 
educational qualifications and all that this implies; this factor is of greater
importance to NNS (Chinese) than it is to NS. Difference in sex appears to play no 
role as a determining factor, and some may choose to discount it in similar research 
in the future. 



7. Implications 

Implications from the results of these tests should only be drawn with extreme 
caution. There is evidence that the receptive vocabularies of NS are generally larger
than those of NNS, but possessing a large receptive vocabulary is probably nothing
more than an indicator of extended exposure to the language, especially in the early
years of life when the mind is at its most receptive. This advantage that NS have
over non-native speakers can be compensated for in part by age, formal
qualifications and extended periods of exposure to the language, and it is
encouraging to observe that education, especially in the form of studying for
university degrees, demonstrably builds up the receptive lexical resources that
NNS have at their disposal. The extent to which this is possible, however, can only
be deduced with difficulty from the results of this piece of small-scale research
comprising fewer than 300 scores, and the tests must clearly be made more
sophisticated and given under carefully controlled conditions to larger samples of
the population of NNS.

The test also raised two particularly interesting questions, namely: 

1. What guarantee does the tester have that the respondent is answering the 
questions honestly? 

2. What is meant by 'knowing a word'? 



The answer to the first question is that the tester can never be absolutely certain
that tests are being answered honestly, but this does not necessarily invalidate all of
them. Firstly, respondents gain no benefit from answering the questions dishonestly,
as they are informed that the test is carried out for statistical purposes only, and if
respondents so wish, they can remain completely anonymous. Furthermore, as
more and more papers are tallied and certain patterns begin to emerge in the answers,
unlikely answers rapidly become apparent, and consequently this potential
problem soon ceases to be a real problem at all.

The second question, that of defining the concept of 'knowing a word', is discussed at
length in Nation (1990:30-32); from his Table 3.1 (p.31) it is clear that, of
the 18 features listed there, only two are addressed by
the tests considered in this paper, namely:

1. Written form   R*   What does the word look like?

2. Meaning        R    What does the word mean?

R* = receptive vocabulary, i.e. not productive vocabulary.



One final point of interest may be worthy of mention here, although it does not
appear in the test results. It was observed that among those tested who scored
approximately 65%, i.e. many NS and a few NNS, the question was often raised
as to whether a word could be regarded as 'known' if, although it had never been
encountered before, it was felt that the meaning of the word would create no problem
if it were encountered in a proper reading context, e.g. aquose - adjective, technical,
scientific, probably something to do with water, e.g. full of water. To such a
question the answer was given that the word is known, and one mark
was awarded. Two points of interest arise here. Firstly, when does this degree of
lexical self-confidence and sophistication begin to appear? There can be little
doubt that once it does a learner's vocabulary increases very rapidly, bringing with
it obvious results in such skills as reading. Secondly, is it possible to bring language
learners to this point more rapidly than at present by pedagogic means, i.e. by
making students increasingly 'root conscious'?

The tests presented here and the results that they have produced must, therefore, 
be considered within the wider context of vocabulary testing in general where they 
can, in fact, play a useful although limited role as a part of a larger battery of tests.

Appendix
Questionnaire 

I would be pleased if you would kindly complete the following questionnaire and 
the three attached vocabulary tests. 

All the information will be kept strictly confidential and used for statistical
purposes only. 

Please circle the appropriate word. 

1. Sex: Male Female 

2. Age: 20+ 30+ 40+ 50+ 

3. Mother tongue: English Chinese Other 

4. If "Other" in "3" above, please state your native language 



5. Qualifications: Doctor, e.g. Ph.D. 

Master, e.g. M.A. 
Bachelor, e.g. B.A. 
Secondary School Certificate 
Other (please state) 

6. If you are not a native speaker of English, please state the number of years you
have used English, including your periods of studying the language.

1+ 5+ 10+ 15+ 20+ 25+ yrs 

Tick the words you know. Add the number of ticks and give the total at the end 
of each test. 



Test 1*

 1 bag         11 avalanche    21 bastinado          31 detente        41 gamp
 2 face        12 firmament    22 countermarch       32 draconic       42 paraprotein
 3 entire      13 shrew        23 furbish            33 glaucoma       43 heterophyllous
 4 approve     14 atrophy      24 meerschaum         34 morph          44 squirearch
 5 tap         15 broach       25 patroon            35 permutate      45 resorb
 6 jersey      16 con          26 regatta            36 thingamabob    46 goldenhair
 7 cavalry     17 halloo       27 asphyxiate         37 piss           47 axbreaker
 8 mortgage    18 marquise     28 curricle           38 brazenfaced    48 masonite
 9 homage      19 stationery   29 weta               39 loquat         49 hematoid
10 colleague   20 woodsman     30 bioenvironmental   40 anthelmintic   50 polybrid



Test 2

 1 ball      11 idol         21 tentacle       31 copulate     41 slurry
 2 dead      12 tributary    22 dynamo         32 paradigm     42 ska
 3 loose     13 potential    23 whiff          33 cadge        43 prunella
 4 royal     14 swine        24 disruption     34 aquaplane    44 glycerose
 5 stomach   15 chink        25 misdemeanour   35 antiphon     45 pessimum
 6 veil      16 stationary   26 libertine      36 acrostical   46 remanence
 7 screw     17 ooze         27 mastication    37 shimmy       47 rhinocerotid
 8 fee       18 terminal     28 complicity     38 pomander     48 secant
 9 mask      19 filth        29 plethora       39 basqueless   49 minikin
10 beak      20 surf         30 argent         40 parameter    50 tansy



Test 3

 1 day      11 bluff        21 patriarch    31 parse       41 possum
 2 fear     12 hesitation   22 tumour       32 spew        42 scarification
 3 memory   13 exit         23 barb         33 re°inal     43 muscatel
 4 plane    14 treatise     24 whetstone    34 aureate     44 aquose
 5 female   15 feverishly   25 chamois      35 corral      45 erythraemia
 6 twist    16 sill         26 sty          36 debrief     46 pelagian
 7 breeze   17 giggle       27 hydraulics   37 sinistral   47 irredentist
 8 pluck    18 swerve       28 addle        38 ablation    48 doweral
 9 jam      19 clod         29 tactile      39 carriole    49 helvellic
10 pulse    20 innovator    30 flageolet    40 parhelion   50 farinulent



*Goulden, R., Nation, P., and Read, J. (1990). How large can a receptive
vocabulary be? (Test 2) Applied Linguistics, 11(4):3W.